Capacitated clustering problem in computational biology: Combinatorial and statistical approach for sibling reconstruction
Authors
Abstract
The capacitated clustering problem (CCP) has been studied in a wide range of applications. In this study, we investigate a challenging CCP in computational biology, namely the sibling reconstruction problem (SRP). The goal of the SRP is to establish the sibling relationships (i.e., groups of siblings) of a population from genetic data. The SRP has attracted growing interest from computational biologists over the past decade, as it is an important and necessary keystone for studies in genetics and population biology. We propose a large-scale mixed integer formulation of the CCP for the SRP that is based on both combinatorial and statistical genetic concepts. The objective is not only to find the minimum number of sibling groups, but also to maximize the degree of similarity of individuals in the same sibling group, while each sibling group is subject to genetic constraints derived from Mendel's laws. We develop a new randomized greedy optimization algorithm to solve this SRP effectively and efficiently. The algorithm consists of two key phases: construction and enhancement. In the construction phase, a greedy approach with randomized perturbation is applied to construct multiple sibling groups iteratively. In the enhancement phase, a two-stage local search with a memory function is used to improve the solution quality with respect to the similarity measure. We demonstrate the effectiveness of the proposed algorithm using real biological data sets and compare it with state-of-the-art approaches in the literature. We also test it on larger simulated data sets. The experimental results show that the proposed algorithm provides the best reconstruction solutions.
Preprint submitted to Computers and Operations Research, April 28, 2011.
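The construction phase described in the abstract can be illustrated with a rough sketch. The abstract does not spell out the genetic constraints or the similarity measure, so the code below makes its own assumptions: feasibility is checked with the well-known 4-allele necessary condition for full siblings (two diploid parents carry at most four distinct alleles per locus), similarity is a simple count of shared alleles, and candidates are picked at random from a restricted list of the most similar feasible individuals. The names `satisfies_four_allele`, `shared_alleles`, `randomized_greedy_groups`, and the parameter `alpha` are hypothetical and are not taken from the paper; the enhancement phase is not shown.

```python
import random


def satisfies_four_allele(group, genotypes):
    """Assumed necessary condition: at every locus, the members of a
    full-sibling group carry at most 4 distinct alleles in total."""
    n_loci = len(genotypes[group[0]])
    for locus in range(n_loci):
        alleles = set()
        for ind in group:
            alleles.update(genotypes[ind][locus])
        if len(alleles) > 4:
            return False
    return True


def shared_alleles(a, b):
    """Hypothetical similarity: number of alleles two genotypes share,
    summed over loci."""
    return sum(len(set(x) & set(y)) for x, y in zip(a, b))


def randomized_greedy_groups(genotypes, alpha=0.3, seed=0):
    """Build sibling groups one at a time. Each step picks at random among
    the top `alpha` fraction of feasible candidates ranked by similarity."""
    rng = random.Random(seed)
    unassigned = list(range(len(genotypes)))
    groups = []
    while unassigned:
        # Start a new group from a random unassigned individual.
        group = [unassigned.pop(rng.randrange(len(unassigned)))]
        while True:
            feasible = [i for i in unassigned
                        if satisfies_four_allele(group + [i], genotypes)]
            if not feasible:
                break
            # Rank candidates by average similarity to the current group.
            feasible.sort(
                key=lambda i: sum(shared_alleles(genotypes[i], genotypes[j])
                                  for j in group) / len(group),
                reverse=True,
            )
            shortlist = feasible[:max(1, int(alpha * len(feasible)))]
            pick = rng.choice(shortlist)
            group.append(pick)
            unassigned.remove(pick)
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    # One locus, two made-up families with disjoint allele sets.
    toy = [
        [(1, 3)], [(1, 4)], [(2, 3)], [(2, 4)],
        [(5, 7)], [(5, 8)], [(6, 7)], [(6, 8)],
    ]
    print(randomized_greedy_groups(toy, alpha=0.3, seed=1))
```

Each genotype here is a list of allele pairs, one pair per locus. Because the 4-allele check is only a necessary condition, a group that passes it is not guaranteed to be a true sibling group; a stricter Mendelian test could be substituted without changing the surrounding loop.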
Similar resources
Combinatorial Reconstruction of Half-Sibling Groups from Microsatellite Data
While full-sibling group reconstruction from microsatellite data is a well-studied problem, reconstruction of half-sibling groups is much less studied, theoretically challenging, and computationally demanding. In this paper, we present a formulation of the half-sibling reconstruction problem and prove its APX-hardness. We also present exact solutions for this formulation and develop heuristics....
Column-Generation Framework of Nonlinear Similarity Model for Reconstructing Sibling Groups
Establishing family relationships, such as parentage and sibling relationships, can be extremely important in biological research, especially in wild species, as they are often key to understanding evolutionary, ecological, and behavioral processes. Because it is often not possible to determine familial relationships from field observations alone, the reconstruction of sibling relationships oft...
An Imperialist Competitive Algorithm and a Mixed Integer Programming Formulation for the Capacitated Vehicle Routing Problem
The Vehicle Routing Problem (VRP), a famous problem in operations research, holds a central place in combinatorial optimization. In this problem, a fleet of vehicles, each with capacity Q, departs from the depot and returns after serving the customers, visiting each customer exactly once and never carrying more than its capacity. The objective is to minimize the number of used vehicles and t...
Improved K-Means Algorithm for Capacitated Clustering Problem
The Capacitated Clustering Problem (CCP) partitions a set of n items (e.g., customer orders) into k disjoint clusters with known capacity. During clustering, the items with the shortest assignment paths from the centroids are grouped together. The total of the grouped items must not exceed the capacity of the cluster. All clusters have uniform capacity. The CCP is NP-Complete and Combinatorial optimization p...
A Simulated Annealing Algorithm for Unsplittable Capacitated Network Design
The Network Design Problem (NDP) is one of the important problems in combinatorial optimization. Among the network design problems, the Multicommodity Capacitated Network Design (MCND) problem has numerous applications in transportation, logistics, telecommunication, and production systems. The MCND problems with splittable flow variables are NP-hard, which means they require exponential time t...
Journal:
- Computers & OR
Volume 39
Publication date: 2012